Pattern feature selection method, classification method, judgment method, program, and device

ABSTRACT

Feature decision means (303) decides a set of features suitable for pattern identification from among a large number of feature candidates generated by feature-candidate generation means (302), by using learning patterns stored in learning-pattern storage means (301). The feature decision means (303) successively decides features according to an information-maximization criterion under the condition that the already-decided features are known, while adding effective noise to the learning patterns, and performs the information-amount calculation approximately and at high speed by merging the learning patterns into a predetermined number of sets when required. As a result, it is possible to automatically create a feature set suitable for high-performance pattern identification without requiring enormous learning. Moreover, by using a transition table (305) that records the transitions between the sets, pattern judgment can be performed with high efficiency.

TECHNICAL FIELD

The present invention relates to a feature selection method, a classification method, a judgment method, a program, and a device that are used for patterns in image identification and the like.

BACKGROUND ART

A method based on a discriminant analysis method, a method based on a principal component analysis method, and so forth, have been known and widely used as methods for deciding a feature used for identifying a pattern from a learning-pattern set (see Japanese Unexamined Patent Application Publication Nos. 9-258492, 4-256087, and 1-321591, for example). However, since every feature decided according to each of the above-described methods is linear, the performance of the methods is limited.

On the other hand, a method using a neural network, and especially a method using a combination of an error back-propagation learning method and the neural network, are known as methods using a nonlinear feature (see "Neurocomputer" edited by Kaoru NAKAMURA, Gijutsu-Hyohron Co., Ltd., 1989, or D. E. Rumelhart, "Parallel Distributed Processing", MIT Press, 1986, for example). According to the above-described methods, it becomes possible to make a neuron of an intermediate layer learn the nonlinear feature suitable for pattern identification. However, the above-described methods have the following problems. That is to say, learning takes an enormous time in the case of a problem of great difficulty, and the performance of the neural network is significantly affected by the number of intermediate-layer neurons. Further, there is no general method for determining the most appropriate number of intermediate-layer neurons in advance.

Further, methods including ID3, C4.5, and so forth, which determine a classification rule at each node of a decision tree by the mutual-information maximization criterion, are known as methods for performing pattern identification (classification) by using a decision tree (see Japanese Unexamined Patent Application Publication Nos. 2000-122690 and 61-75486, and J. R. Quinlan, "C4.5: Programs for Machine Learning", 1993). The learning time required for these methods is shorter than in the case of the above-described methods using the neural network. On the other hand, there is a perception that the performance of these methods is inferior to that of the neural-network methods in general. For example, the ability of these methods to identify a pattern other than the learning patterns (generalization ability) is not at an adequate level.

DISCLOSURE OF INVENTION

Accordingly, the object of the present invention is to provide a pattern judgment method for achieving high-performance pattern identification without requiring enormous learning, and a feature selection method and a pattern classification method that serve as preconditions for the pattern judgment method.

According to a first aspect of the present invention, there is provided a feature selection method used for a system including learning-pattern storage means for storing and maintaining learning patterns including class information, feature-candidate generation means for generating a plurality of feature candidates, and feature decision means for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, wherein the feature decision means decides a predetermined feature candidate having a largest amount of mutual information with the class information of the set of the learning patterns, as a first feature of the feature set, by calculating a feature value of each of the learning patterns corresponding to the feature candidates, and decides predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that the already-decided features are known.

According to a modification of the first aspect of the present invention, there is provided a feature selection method used for a system including learning-pattern storage means for storing and maintaining learning patterns including class information, feature-candidate generation means for generating a plurality of feature candidates, and feature decision means for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, wherein the feature decision means prepares a predetermined number of sets for causing the learning patterns to transition according to the feature values, decides a predetermined feature candidate having a largest amount of mutual information with the class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adds a weight to each of the learning patterns according to the decided feature, distributes the learning patterns, and causes the learning patterns to transition to a set corresponding to the decided feature in sequence, and decides predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the already-decided features are known.

According to a second aspect of the present invention, there is provided a method for classifying patterns used for a system including learning-pattern storage means for storing and maintaining the learning patterns used for learning, feature-candidate generation means for generating a plurality of feature candidates, feature decision means for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, feature storage means for storing and maintaining the feature set decided by the feature decision means, and classification-table generation means for generating a classification table, wherein the classification-table generation means calculates a feature value of each of the learning patterns by using the feature set decided according to the above-described selection method, and classifies the learning patterns according to the classification table including the feature values of the learning patterns and class information.

According to a third aspect of the present invention, there is provided a method for pattern judgment used for a system including pattern input means for inputting patterns, feature extraction means for extracting features from the patterns, pattern judgment means for judging the patterns based on the features, and feature storage means for storing and maintaining a decided feature set, wherein the feature extraction means calculates a feature value of each of the input patterns by using the feature set decided according to the above-described feature selection methods, and the pattern judgment is performed based on the calculated result.

According to a modification of the third aspect of the present invention, there is provided a method for pattern judgment used for a system including pattern input means for inputting patterns, feature extraction means for extracting features from the patterns, pattern judgment means for judging the patterns based on the features, and feature storage means for storing and maintaining a decided feature set, wherein the feature extraction means calculates a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set decided according to the above-described feature selection methods, and wherein the pattern judgment means causes the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores the sets to which the learning patterns belong at the time where each feature is decided in sequence, and performs the pattern judgment by calculating a probability that each of the input patterns has predetermined class information based on the route of the transition.

According to a fourth aspect of the present invention, there is provided a program run by a computer forming a system including learning-pattern storage means for storing and maintaining learning patterns including class information, feature-candidate generation means for generating a plurality of feature candidates, and feature decision means for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, wherein the program performs feature selection by making the computer perform the steps of deciding a predetermined feature candidate having a largest amount of mutual information with the class information of the set of the learning patterns, as a first feature of the feature set, by calculating a feature value of each of the learning patterns corresponding to the feature candidates, and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that the already-decided features are known.

According to a modification of the fourth aspect of the present invention, there is provided a program run by a computer forming a system including learning-pattern storage means for storing and maintaining learning patterns including class information, feature-candidate generation means for generating a plurality of feature candidates, and feature decision means for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, wherein the program performs feature selection by making the computer perform the steps of preparing a predetermined number of sets for causing the learning patterns to transition according to the feature values, deciding a predetermined feature candidate having a largest amount of mutual information with the class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adding a weight to each of the learning patterns according to the decided feature, distributing the learning patterns, and causing the learning patterns to transition to a set corresponding to the decided feature in sequence, and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the already-decided features are known.

According to another modification of the fourth aspect of the present invention, there is provided a program run by a computer forming a system including learning-pattern storage means for storing and maintaining learning patterns used for learning, feature-candidate generation means for generating a plurality of feature candidates, feature decision means for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation means, feature storage means for storing and maintaining the feature set decided by the feature decision means, and classification-table generation means for generating a classification table, wherein the program performs learning-pattern classification by making the computer perform the steps of calculating a feature value of each of the learning patterns by using the feature set decided by running the above-described programs, and classifying the learning patterns according to the classification table including the feature values of the learning patterns and class information.

According to still another modification of the fourth aspect of the present invention, there is provided a program run by a computer forming a system including pattern input means for inputting patterns, feature extraction means for extracting features from the patterns, pattern judgment means for judging the patterns based on the features, and feature storage means for storing and maintaining a decided feature set, wherein the program makes the computer perform a step of calculating a feature value of each of the input patterns by using the feature set decided by running the above-described programs, whereby pattern judgment is performed based on the calculation result.

According to still another modification of the fourth aspect of the present invention, there is provided a program run by a computer forming a system including pattern input means for inputting patterns, feature extraction means for extracting features from the patterns, pattern judgment means for judging the patterns based on the features, and feature storage means for storing and maintaining a decided feature set, wherein the program judges the input patterns based on the set to which the input patterns belong by making the computer perform the steps of calculating a feature value of each of the input patterns by using the feature set decided by running the above-described programs, and causing the input patterns to transition based on the feature values of the input patterns and a transition table that stores the sets to which the learning patterns belong at the time where each feature is decided in sequence by running the program according to the modification of the fourth aspect.

According to still another modification of the fourth aspect of the present invention, there is provided a program run by a computer forming a system including pattern input means for inputting patterns, feature extraction means for extracting features from the patterns, pattern judgment means for judging the patterns based on the features, and feature storage means for storing and maintaining a decided feature set, wherein the program makes the computer perform the steps of calculating a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set decided by running the above-described programs, causing the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores the sets to which the learning patterns belong at the time where each feature is decided in sequence by running the program according to the modification of the fourth aspect, and calculating a probability that each of the input patterns has predetermined class information based on the route of the transition, whereby the input patterns are judged based on the calculation result.

According to a fifth aspect of the present invention, there is provided a pattern learning system for maintaining the above-described programs so that the programs can be run, so as to perform learning-pattern-feature selection.

According to a sixth aspect of the present invention, there is provided a pattern classification system for maintaining the above-described programs so that the programs can be run, so as to perform learning-pattern classification.

According to a seventh aspect of the present invention, there is provided a pattern judgment system for maintaining the above-described programs so that the programs can be run, so as to perform input-pattern judgment.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the configuration of a system for performing a feature selection method according to a first embodiment of the present invention.

FIG. 2 is a flowchart illustrating the feature selection method according to the first embodiment of the present invention.

FIG. 3 is a flowchart illustrating an example classification-table generation process according to the present invention.

FIG. 4 shows an example classification table generated according to the present invention.

FIG. 5 is a block diagram showing the configuration of a system for performing a pattern judgment method according to the first embodiment of the present invention.

FIG. 6 is a flowchart illustrating another example classification-table generation process according to the present invention.

FIG. 7 is a block diagram illustrating the configuration of a system for performing a feature selection method according to a third embodiment of the present invention.

FIG. 8 is a flowchart illustrating the feature selection method according to the third embodiment of the present invention.

FIG. 9 is another flowchart illustrating the feature selection method according to the third embodiment of the present invention.

FIG. 10 is another flowchart illustrating the feature selection method according to the third embodiment of the present invention.

FIG. 11 shows an example transition table generated according to the present invention.

FIG. 12 shows another example transition table generated according to the present invention.

FIG. 13 schematically shows transitions between sets according to the third embodiment of the present invention.

FIG. 14 is a block diagram illustrating the configuration of a system for performing a pattern judgment method according to the third embodiment of the present invention.

BEST MODES FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will now be described with reference to the attached drawings.

FIG. 1 is a block diagram illustrating a system configuration for a feature selection method according to a first embodiment of the present invention. Referring to this drawing, learning-pattern storage means 101, feature-candidate generation means 102, feature decision means 103, feature storage means 104, classification-table generation means 105, and a classification table 106 are shown.

The learning-pattern storage means 101 is used for storing and maintaining a predetermined number of learning patterns used for learning. The feature-candidate generation means 102 is used for generating feature candidates from a predetermined number of feature-parameter sets in sequence. The feature decision means 103 decides a feature set that is most suitable for pattern identification from among the feature candidates generated by the feature-candidate generation means.

The feature storage means 104 is used for storing and maintaining the feature set decided by the feature decision means 103. The classification-table generation means 105 is used for generating the classification table 106 for performing pattern judgment by using the feature set decided by the feature decision means 103.

Next, the procedural steps of the feature selection method according to the present invention will be described with reference to the attached drawings. FIG. 2 is a flowchart illustrating the flow of processing for the feature selection method of the present invention.

Referring to FIGS. 1 and 2, the feature-candidate generation means 102 generates a feature candidate (step S001). More specifically, the feature-candidate generation means 102 selects the s-th (where s=1 to N and s starts from one) feature-parameter set (k_s, r₀_s, σ_s, th_s) from among a large number of (N) feature-parameter sets that have been prepared, and substitutes the s-th feature-parameter set into (k, r₀, σ, th). Subsequently, a complex Gabor function Gab and a Gaussian function G, shown in the following equations and defined by the parameters k, r₀, and σ, are generated.

$\left. \begin{aligned} \mathrm{Gab}(\mathbf{r};\mathbf{k},\mathbf{r}_0,\sigma) &= \exp\bigl(i\,\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0) - |\mathbf{r}-\mathbf{r}_0|^2/(2\sigma^2)\bigr) \\ G(\mathbf{r};\mathbf{r}_0,\sigma) &= \exp\bigl(-|\mathbf{r}-\mathbf{r}_0|^2/(2\sigma^2)\bigr)/(2\pi\sigma^2) \end{aligned} \right\} \qquad (1)$

Here, r = (x, y) is a position vector and i² = −1 holds. The feature-candidate generation means 102 transmits the complex Gabor function Gab, the Gaussian function G, a threshold parameter th, and a feature-candidate identification number s shown in Equations (1) to the feature decision means 103 (step S002).
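
For concreteness, the following sketch (in Python with NumPy; the function name, grid layout, and parameter values are illustrative assumptions, not taken from the embodiment) samples the complex Gabor function Gab and the Gaussian function G of Equations (1) on a discrete pixel grid:

    import numpy as np

    def make_kernels(shape, k, r0, sigma):
        # Sample Gab(r; k, r0, sigma) and G(r; r0, sigma) of Equations (1)
        # on an integer pixel grid; r = (x, y), and k and r0 are 2-vectors.
        ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
        dx, dy = xs - r0[0], ys - r0[1]
        d2 = dx**2 + dy**2                               # |r - r0|^2
        gab = np.exp(1j*(k[0]*dx + k[1]*dy) - d2/(2*sigma**2))
        g = np.exp(-d2/(2*sigma**2)) / (2*np.pi*sigma**2)
        return gab, g

    # Example: one feature candidate on a 32x32 image grid.
    gab, g = make_kernels((32, 32), k=(0.5, 0.0), r0=(16.0, 16.0), sigma=4.0)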

The learning-pattern storage means 101 transmits a combination of predetermined M different learning patterns (images) f_t(r) (where t=1 to M) and the class q_t (where t=1 to M) to which each learning pattern belongs to the feature decision means 103 (step S003). In this embodiment, the number of classes is set to two (q=0 or 1) for simplicity of description. Of course, the method of the present invention can be used for the case where the number of classes is three or more.

The feature decision means 103 calculates a feature c according to the following equations by using the feature candidate (the complex Gabor function, the Gaussian function, and the other parameters shown in Equations (1)) for the learning patterns sequentially transmitted from the learning-pattern storage means 101 (step S004). Here, the t-th learning pattern is denoted by f_t(r), and the calculation is repeated for each of the entire (M) learning patterns.

$\left. \begin{aligned} a &= \Bigl|\sum_{\mathbf{r}} f_t(\mathbf{r})\,\mathrm{Gab}(\mathbf{r};\mathbf{k},\mathbf{r}_0,\sigma)\Bigr|^2 \Big/ \sum_{\mathbf{r}} f_t(\mathbf{r})^2\, G(\mathbf{r};\mathbf{r}_0,\sigma) \\ c &= 1 \quad \text{if } a \ge th \\ c &= 0 \quad \text{otherwise} \end{aligned} \right\} \qquad (2)$

The denominator of the upper equation of Equations (2) is a normalization (standardization) factor for reducing the variance of the value a due to the pattern size (the image brightness). This denominator can be replaced by a normalization factor of another form. Further, the normalization factor can be omitted according to the type of pattern to be handled.
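
As an illustration, a minimal sketch of the feature calculation of Equations (2) follows (assuming `make_kernels` from the previous sketch; `f` is assumed to be a learning-pattern image array of the same shape as the kernels):

    import numpy as np

    def binary_feature(f, gab, g, th):
        # Normalized squared Gabor response a and binary feature c, per
        # Equations (2); the denominator is the normalization factor above.
        a = abs(np.sum(f * gab))**2 / np.sum(f.astype(float)**2 * g)
        return a, (1 if a >= th else 0)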

Where the feature c has been calculated for every learning pattern by using the s-th feature candidate (feature-parameter set) in the above-described manner, the feature decision means 103 then calculates the mutual-information amount MI obtained from the s-th feature candidate according to the following equations. The feature decision means 103 stores the calculated mutual-information amount MI together with the feature-candidate identification number s (step S005).

$\begin{aligned} MI[Q;C] &= H[Q] - \langle H[Q|c] \rangle_c \\ \text{where}\qquad H[Q] &= -\sum_q P(q)\log P(q), & P(q) &= M(q)/M \\ H[Q|c] &= -\sum_q P(q|c)\log P(q|c), & P(q|c) &= M(q,c)/M(c) \end{aligned} \qquad (3)$

Here, reference character Q indicates the set of classes {q=0, q=1} and reference character M indicates the total number of learning patterns. Further, M(q) indicates the total number of learning patterns belonging to class q, M(c) indicates the total number of learning patterns whose feature value is c, and M(q, c) indicates the total number of learning patterns that have feature value c and that belong to class q.

⟨·⟩_c shown in Equations (3) indicates an averaging operation over c. That is to say, the following equations hold.

$\langle H[Q|c] \rangle_c = \sum_c P(c)\, H[Q|c], \qquad P(c) = M(c)/M$
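
A minimal sketch of the mutual-information calculation of Equations (3) follows, assuming `c` and `q` are equal-length arrays of 0/1 feature values and class labels. (The logarithm base is not specified in the embodiment; base 2 is assumed here, which does not change which candidate maximizes MI.)

    import numpy as np

    def entropy(labels):
        # H = -sum_q P(q) log P(q), estimated from a 0/1 label array.
        p = np.bincount(np.asarray(labels, int), minlength=2) / len(labels)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def mutual_information(c, q):
        # MI[Q;C] = H[Q] - <H[Q|c]>_c of Equations (3).
        c, q = np.asarray(c, int), np.asarray(q, int)
        h_cond = sum(np.mean(c == v) * entropy(q[c == v])
                     for v in (0, 1) if np.any(c == v))
        return entropy(q) - h_cond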

According to the above-described operations, the next, (s+1)-th, feature candidate is transmitted from the feature-candidate generation means, and similar operations are repeated (steps S002 to S005). Where the mutual-information amount has been calculated for the entire (N) feature candidates in the above-described manner, the feature decision means 103 compares the mutual-information amounts obtained from the feature candidates with one another. Then, the feature decision means 103 determines the feature candidate from which the largest mutual-information amount (MaxMI[Q;C]) is obtained to be the first feature of the feature set to be determined (step S006).

When the first feature has been determined in the above-described manner, the feature decision means 103 determines a second feature. The feature decision means 103 receives feature candidates in sequence from the feature-candidate generation means 102 in the above-described manner (step S002) and calculates the feature c for each learning pattern (steps S003 and S004). This operation can be replaced by the following operation. That is to say, the result of the calculation of the feature c performed at step S004 for determining the above-described first feature is stored and maintained, according to the amount of available storage, so that the feature decision means 103 can read the stored and maintained data. When the feature c has been calculated for each learning pattern by using the s-th feature-parameter set, the feature decision means calculates the information amount MI₂ obtained from the s-th feature candidate according to the following equations, under the condition that the first feature c₁ that has already been determined is known, and stores the calculation result and the feature-candidate identification number s (step S005).

$\begin{aligned} MI_2[Q;C|C_1] &= \langle H[Q|c_1] \rangle_{c_1} - \langle H[Q|c,c_1] \rangle_{c_1,c} \\ \text{where}\qquad H[Q|c_1] &= -\sum_q P(q|c_1)\log P(q|c_1), & P(q|c_1) &= M(q,c_1)/M(c_1) \\ H[Q|c,c_1] &= -\sum_q P(q|c,c_1)\log P(q|c,c_1), & P(q|c,c_1) &= M(q,c,c_1)/M(c,c_1) \end{aligned} \qquad (4)$

Here, M(c₁) indicates the total number of learning patterns whose first feature is c₁. M(q, c₁) indicates the total number of learning patterns that have first feature c₁ and that belong to class q. M(c, c₁) indicates the total number of learning patterns that have feature value c and first feature c₁. M(q, c, c₁) indicates the total number of learning patterns that have feature value c and first feature c₁, and that belong to class q.

According to the above-described operations, the next, (s+1)-th, feature candidate is transmitted from the feature-candidate generation means, and the same operations are repeated (steps S002 to S005). Where the conditional mutual-information amount MI₂ has been calculated for the entire (N) feature candidates in the above-described manner, the feature decision means 103 compares the amounts obtained from the feature candidates with one another. Then, the feature decision means 103 determines the feature candidate from which the largest mutual-information amount is obtained to be the second feature of the feature set to be determined (step S006).

Subsequently, where the m-th feature has been determined, the (m+1)-th feature is the feature candidate c that maximizes the evaluation function MI_{m+1} shown in the following equation.

$MI_{m+1}[Q;C|C_1,C_2,\ldots,C_m] = \langle H[Q|c_1,c_2,\ldots,c_m] \rangle_{c_1,c_2,\ldots,c_m} - \langle H[Q|c,c_1,c_2,\ldots,c_m] \rangle_{c_1,c_2,\ldots,c_m,c} \qquad (5)$

The above-described function MI_{m+1} indicates the information amount obtained from the feature c under the condition that the features up to the m-th feature (c₁, c₂, . . . c_m) are known. The above-described procedure is continued until the amount of obtained information (the amount of appended information) becomes smaller than a prepared threshold value MI_th even though a new feature is selected. For example, where the threshold value is set to zero, the above-described procedure for determining the next feature is repeated until the amount of obtained information (the amount of appended information) becomes zero, that is to say, until the discontinue requirements are fulfilled.

Where the discontinue requirements are fulfilled, the feature-decision procedure is finished. Each parameter of the determined feature set is stored in the feature storage means 104 (step S007).
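
A minimal sketch of the whole greedy loop under the criterion of Equation (5) follows. The helper names are hypothetical, and `candidates` is assumed to hold the precomputed 0/1 feature column of every candidate over the M learning patterns, as stored at step S004:

    import numpy as np
    from collections import defaultdict

    def entropy(labels):
        p = np.bincount(np.asarray(labels, int), minlength=2) / len(labels)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def conditional_mi(c_new, chosen, q):
        # MI_{m+1}[Q; C | C1..Cm] of Equation (5): average, over the joint
        # values of the already-decided feature columns, of the information
        # the candidate column c_new adds about the class labels q.
        q, c_new = np.asarray(q, int), np.asarray(c_new, int)
        groups = defaultdict(list)
        for t in range(len(q)):
            groups[tuple(col[t] for col in chosen)].append(t)
        mi = 0.0
        for idx in groups.values():
            qi, ci, w = q[idx], c_new[idx], len(idx) / len(q)
            h_after = sum(np.mean(ci == v) * entropy(qi[ci == v])
                          for v in (0, 1) if np.any(ci == v))
            mi += w * (entropy(qi) - h_after)
        return mi

    def select_features(candidates, q, mi_th=0.0):
        # Greedily add the candidate with the largest conditional mutual
        # information until the appended information drops to MI_th.
        chosen, picked = [], []
        remaining = set(range(len(candidates)))
        while remaining:
            gain, s = max((conditional_mi(candidates[s], chosen, q), s)
                          for s in remaining)
            if gain <= mi_th:
                break
            chosen.append(np.asarray(candidates[s], int))
            picked.append(s)
            remaining.remove(s)
        return picked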

The following configuration for decreasing the number of feature candidates generated by the feature-candidate generation means can be provided as a modification of the above-described feature selection method. For example, for each complex Gabor function, an in-class mean value of the values a calculated according to Equations (2) is determined in advance, for the class for which q=0 holds and for the class for which q=1 holds. Then, the threshold value th_s is fixed so as to be an intermediate value between these two in-class mean values. Further, the following configuration can be proposed. That is to say, where the mutual-information amount MI is calculated for each complex Gabor function according to Equations (3) for determining the first feature, the threshold value th_s giving the maximum MI value for each feature candidate may be stored, and that threshold value th_s may be kept fixed when the second and later features are determined, for example.

Further, according to the above-described embodiment, the complex Gabor function is used as the feature-extraction function forming the feature candidates. However, other feature-extraction functions can be added to the complex Gabor function. Further, the feature candidates may include only other feature-extraction functions, as required.

Further, the following modification is also preferable. That is to say, a subspace is generated for each class, and an indicator of the distance to the subspace is added to the feature candidates. Further, a weighted average luminance near a predetermined point, where the weighted average luminance is calculated by using a Gaussian function, or the weighted average luminance normalized by an average luminance calculated by using a Gaussian function having a spread larger than that of the above-described Gaussian function (that is, an indicator of whether an area near the predetermined point is lighter or darker than the area around the predetermined point), can be added to the feature candidates. Otherwise, a standard feature used for pattern judgment can be added to the feature candidates.

Where the procedure for determining features is finished and the determined feature set is stored in the feature storage means 104, the classification table 106 (see FIG. 4) used for pattern identification can be generated. The procedure for generating the classification table 106, performed by the classification-table generation means 105 started by predetermined means, will be described below.

First, the classification-table generation means 105 receives the learning patterns from the learning-pattern storage means 101 and the parameters of the feature set stored at the above-described step S007 from the feature storage means 104. (In the following description, it is assumed that n features in total have been determined.) Then, the classification-table generation means 105 calculates the value of each of the features (c₁, c₂, . . . c_n) for each learning pattern according to Equations (2). Then, indicator 1 is recorded in the classification table where a given learning pattern belongs to the class for which q=1 holds, or indicator 0 where the learning pattern belongs to the class for which q=0 holds. The feature vector (c₁, c₂, . . . c_n) corresponding to the learning pattern is also stored in the classification table.

Although the above-described procedure allows for generating a classification table that uniquely classifies the learning patterns, the use of a redundant item (don't-care item) is more preferable. For example, where a predetermined pattern can be classified by using the values of the first to i-th features (c₁, c₂, . . . c_i), the values of the (i+1)-th and later features are replaced with a don't-care sign and stored.

The procedure for generating a classification table by using the above-described redundant item (don't-care item) will be described with reference to the attached drawing. FIG. 3 is a flowchart showing an example process for generating the classification table according to the embodiment.

Referring to this drawing, first, the classification-table generation means 105 calculates the value of the feature vector (c₁, c₂, . . . c_n) by using each of the parameters of the determined feature set for each input learning pattern (steps S101 and S102).

It is then determined whether or not a learning pattern having a feature vector that matches the above-described feature vector exists in the classification table (step S103). Where the don't-care sign is written in the classification table, the value of the feature corresponding to the sign is regarded as matching the corresponding value of the above-described feature vector, whatever that value may be.

Where a learning pattern having a feature vector that matches the calculated feature vector exists in the classification table according to the above-described determination, the process returns to step S101 without storing the learning pattern. Then, the next learning pattern is input.

On the other hand, where no pattern having a feature vector that matches the calculated feature vector exists, an increment variable i=1 is set (step S104) and the following process is performed. First, it is determined whether or not a learning pattern whose first to i-th features (c₁, c₂, . . . c_i) match those of the current learning pattern exists among the learning patterns belonging to a class (for example, q=1) other than the class (for example, q=0) to which the current learning pattern belongs (step S105).

Where there is no pattern that matches the above-described learning pattern, as a result, the values of the first to i-th features (c₁, c₂, . . . c_i) are recorded in the classification table with an indicator of the class (for example, q=0) to which this learning pattern belongs. Further, the don't-care sign is recorded for the values of the (i+1)-th and later features (step S106). Then, the process returns to step S101, so as to input the next learning pattern.

On the contrary, where at least one learning pattern that matches this learning pattern exists, the increment variable i is incremented by one and the process returns to step S105. That is to say, the i-incrementing process is continued until the input pattern can be distinguished from the patterns of the other classes by using the values of the features up to the i-th feature.

The above-described processes are repeated until all learning patterns have been input.

According to the above-described procedure, not all of the learning patterns may be classified. For example, learning patterns belonging to different classes may have the same feature vector. In this case, the number of learning patterns belonging to each class is counted, and the class having more learning patterns than the other class is determined to be the class indicated by that feature vector.

Of course, other methods can be adopted. For example, patterns whose features c₁ to c_i match one another can be grouped (fragmentation) until only one pattern exists in each group. Then, each of the (i+1)-th and later features is set to the don't-care item.
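
A minimal sketch of the table-building procedure of FIG. 3 follows (the names are hypothetical; `vectors` is assumed to hold the feature vector of each learning pattern as a tuple, and `classes` the class indicators; identical vectors of different classes are left to the majority-vote rule described above):

    def build_classification_table(vectors, classes):
        # One row per stored pattern: the shortest feature prefix shared by
        # no pattern of another class, padded with the don't-care sign '*'.
        table = []
        def row_matches(row, vec):
            return all(r == '*' or r == v for r, v in zip(row, vec))
        for vec, q in zip(vectors, classes):
            if any(row_matches(row, vec) for row, _ in table):
                continue                      # already covered by the table
            n = len(vec)
            for i in range(1, n + 1):
                clash = any(ov[:i] == vec[:i]
                            for ov, oq in zip(vectors, classes) if oq != q)
                if not clash:
                    table.append((list(vec[:i]) + ['*'] * (n - i), q))
                    break
            # if a clash remains even at i = n, the pattern is left to the
            # majority-vote treatment described in the text
        return table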

FIG. 4 illustrates an example classification table generated according to the present invention. Referring to this drawing, a table storing the identification indicator (q) of the class of each learning pattern and the feature vectors (c₁, c₂, . . . c_n) is shown. Further, in this drawing, the sign "*" indicates the don't-care item.

Next, a pattern-judgment method performed by using the above-described classification table will be described with reference to the attached drawing.

FIG. 5 is a block diagram illustrating the flow of processing for a pattern-judgment method according to the present invention. Referring to this drawing, pattern input means 201, feature extraction means 202, and pattern judgment means 203 are shown. Further, the feature storage means 104 for storing and maintaining the determined feature set used by the feature extraction means 202 for extracting features, and the already-generated classification table 106, which is used by the pattern judgment means 203 for judging patterns, are shown.

The pattern input means 201 is used for inputting a pattern from a predetermined medium. The input pattern is not limited to character data, drawing data, speech data, and so forth. For example, data on images of the face, fingerprint, or retina of a person, or data on the speech or the like of the person, may be input as information used for identifying the person.

The feature extraction means 202 extracts the features of the input pattern transmitted from the pattern input means 201 by using the determined feature set.

The pattern judgment means 203 judges the information indicated by the input pattern based on the features obtained by the feature extraction means 202.

The operation of the above-described pattern judgment method will be described below. First, the pattern input means 201 receives the input pattern from the predetermined medium and transmits the input pattern to the feature extraction means 202.

Then, the feature extraction means 202 calculates the feature vector (c₁, c₂, . . . c_n) for the input pattern according to Equations (2) by using the feature set (determined according to the above-described feature determination method) stored in the feature storage means 104. Further, the feature extraction means 202 transmits the calculation result to the pattern judgment means 203.

The pattern judgment means 203 makes a search for a feature vector that matches the above-described feature vector by referring to the classification table 106. Then, the pattern judgment means 203 reads the identifier of the class recorded for that feature vector and outputs it as the judgment result. Where the don't-care sign is recorded in the classification table, the pattern judgment means 203 regards the value of the feature corresponding to the sign as matching the corresponding value of the above-described feature vector, whatever that value may be.
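
A minimal sketch of this lookup follows (assuming the table rows produced by the earlier table-building sketch; '*' is the don't-care sign):

    def judge(feature_vector, table):
        # Return the class indicator of the first table row that the computed
        # feature vector matches; '*' matches any feature value.
        for row, q in table:
            if all(r == '*' or r == v for r, v in zip(row, feature_vector)):
                return q
        return None    # no row matches: the pattern cannot be judged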

Here, the difference between the present invention and the known methods (ID3, C4.5, and so forth) performed by using a decision tree will be described, so as to clarify the advantages of the present invention in determining pattern features and performing pattern judgment according to the above-described procedures.

ID3 and the like and the present invention share a common method of determining a classification rule at each node of a decision tree according to the information-maximization criterion. However, according to the methods used in ID3 and C4.5, the classification rule ("feature" is used in the present invention as the word corresponding to the classification rule) is determined separately for each node. For example, where a second feature is determined after determining a first feature c₁, and where the value of the first feature c₁ can be either one or zero, the classification rule (feature) is determined differently according to the circumstances. In the present invention, however, the same feature is determined for all nodes of the same depth. That is to say, the n-th features are the same as one another, which is a significant difference between the present invention and the known methods.

Of course, the learning patterns can be entirely classified either way. However, there is a significant difference between the generalization ability, that is, the ability to identify an unlearned pattern, of the present invention and that of the known methods. Where both trees have the same depth (n), for the sake of simplification, 2^n features are determined according to the method of ID3 or C4.5, whereas n features are determined according to the method of the present invention. Therefore, the configuration of the present invention is simpler than those of the known methods. It should be noted that the difference between the number of features determined by the present invention and that of the known methods increases exponentially as the problem becomes more difficult and a deeper tree is required.

Incidentally, where classifiers having the same performance for the learning patterns are provided, and where one of the classifiers has a configuration simpler than those of the others, the generalization ability of the classifier having the simpler configuration is greater than those of the other classifiers ("Ockham's razor"). Accordingly, the feature selection method and the pattern judgment method using the same according to the present invention can significantly increase the performance, most notably the generalization ability. That is to say, the generalization ability of the present invention becomes greater than those of the known methods.

Next, a second embodiment of the present invention, which adds effective noise to the learning patterns, will be described.

The system configuration of the second embodiment is substantially the same as that of the above-described first embodiment (see FIG. 1). Further, the flow of processing of the second embodiment is substantially the same as that of the above-described first embodiment (see FIGS. 2 to 5). Only the differences between the first and second embodiments will be described below.

According to this embodiment, a noise parameter σ_n_s (s=1 to N) is further set in advance, so as to be added to the large number of (N) feature-parameter sets (k_s, r₀_s, σ_s, th_s) used by the feature-candidate generation means 102. Then, as is the case with step S001 of the first embodiment, the feature-candidate generation means 102 substitutes the s-th (where s starts from one) feature-parameter set (k_s, r₀_s, σ_s, th_s, σ_n_s) into (k, r₀, σ, th, σ_n) and generates a complex Gabor function and a Gaussian function according to Equations (1). Then, the feature-candidate generation means 102 transmits the above-described complex Gabor function, Gaussian function, threshold parameter th, noise parameter σ_n, and the feature-candidate identification number s to the feature decision means 103 (step S002 shown in FIG. 2).

The feature decision means 103 receives the learning patterns in sequence from the learning-pattern storage means 101 (step S003 shown in FIG. 2) and calculates a feature b for each learning pattern according to the following equations by using the above-described complex Gabor function and Gaussian function (step S004 shown in FIG. 2). Here, the t-th learning pattern is denoted by f_t(r) (t=1 to M).

$\left. \begin{aligned} a &= \Bigl|\sum_{\mathbf{r}} f_t(\mathbf{r})\,\mathrm{Gab}(\mathbf{r};\mathbf{k},\mathbf{r}_0,\sigma)\Bigr|^2 \Big/ \sum_{\mathbf{r}} f_t(\mathbf{r})^2\, G(\mathbf{r};\mathbf{r}_0,\sigma) \\ b &= \mathrm{Erf}\bigl((a - th)/\sigma_n\bigr) \end{aligned} \right\} \qquad (6)$

Erf(x) shown in Equations (6) indicates an error function. This error function can be replaced by another nonlinear function whose value lies between zero and one inclusive, such as a sigmoid function.
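
A minimal sketch of the soft feature of Equations (6) follows. Note that the standard error function erf ranges over (−1, 1), whereas the embodiment states that b lies between zero and one, so the sketch rescales erf accordingly (an assumption about the intended normalization):

    from math import erf

    def soft_feature(a, th, sigma_n):
        # b = Erf((a - th)/sigma_n) of Equations (6), rescaled from the
        # standard erf range (-1, 1) to the stated range [0, 1].
        return 0.5 * (1.0 + erf((a - th) / sigma_n))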

Where the feature b has been calculated for every learning pattern by using the s-th feature-parameter set in the above-described manner, the feature decision means 103 then calculates the mutual-information amount MI obtained from the s-th feature candidate in the following manner, according to the above-described Equations (3) (step S005 shown in FIG. 2).

First, where the value of the feature calculated for a predetermined learning pattern according to the above-described Equations (6) is b (0≦b≦1), the feature c of that learning pattern takes the value one with probability b and the value zero with probability (1−b). M(c) shown in Equations (3) is then replaced by the expected value of the total number of learning patterns whose feature value is c. Similarly, M(q, c) is replaced by the expected value of the total number of learning patterns that have feature value c and that belong to class q. Subsequently, the mutual-information amount MI is calculated.
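
A minimal sketch of this expected-count calculation follows (assuming `b` holds the soft feature of every learning pattern and `q` the class labels; a base-2 logarithm is assumed, as before):

    import numpy as np

    def expected_count_mi(b, q):
        # MI of Equations (3) with every count replaced by its expected value:
        # pattern t contributes weight b_t to c=1 and (1 - b_t) to c=0.
        b, q = np.asarray(b, float), np.asarray(q, int)
        M = len(q)
        def H(p):
            p = p[p > 0]
            return -np.sum(p * np.log2(p))
        mi = H(np.bincount(q, minlength=2) / M)          # H[Q]
        for w in (b, 1.0 - b):                           # weights for c=1, c=0
            Mc = w.sum()                                 # expected M(c)
            if Mc > 0:
                pqc = np.array([w[q == 0].sum(), w[q == 1].sum()]) / Mc
                mi -= (Mc / M) * H(pqc)                  # - P(c) H[Q|c]
        return mi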

Where the mutual-information amount has been calculated for all of the feature candidates, the feature decision means 103 performs a comparison between the mutual-information amounts obtained from the feature candidates, as in the first embodiment. Then, the feature decision means 103 determines the feature candidate from which the largest mutual-information amount is obtained to be the first feature of the feature set to be determined (step S006 shown in FIG. 2).

Subsequently, where the m-th feature has been determined, the (m+1)-th feature c_{m+1} is determined, under the condition that the features up to the m-th feature (c₁, c₂, . . . c_m) are known, so that the information amount MI_{m+1} obtained from the feature c has a maximum value. However, as in the case where the above-described first feature is determined, the various total numbers of learning patterns are replaced by the corresponding expected values for calculating the mutual-information amount. For calculating MI₂, for example, M(c, c₁) is replaced by the "expected value of the total number of learning patterns that have feature value c and first feature c₁". Further, M(q, c, c₁) is replaced by the "expected value of the total number of learning patterns that have feature value c and first feature c₁ and that belong to class q".

As in the first embodiment, the above-described procedure is continued until the amount of obtained information (the amount of appended information) becomes smaller than a prepared threshold value even though a new feature is selected. Once the discontinue requirements are fulfilled, each parameter of the determined feature set is stored in the feature storage means 104 (step S007 shown in FIG. 2).

Next, a procedure for generating another classification table according to the embodiment will be described with reference to the attached drawing. FIG. 6 is a flowchart showing an example process for generating the classification table according to the embodiment. Here, n features have been determined by the feature decision means and the classification table is initialized (cleared).

Referring to FIG. 6, first, the classification-table generation means 105 calculates the features b_s (b₁, b₂, . . . b_n) according to Equations (6) by using each of the parameters of the determined feature set for the learning patterns (steps S201 and S202). Where the features b_s have been calculated for all of the learning patterns, the classification-table generation means 105 determines the probability that the value of the s-th feature c_s becomes one to be b_s and the probability that the value of the s-th feature c_s becomes zero to be (1−b_s). Further, the classification-table generation means 105 initializes the value of j to one and starts selecting patterns to be recorded in the classification table 106 (step S203).

First, the classification-table generation means 105 generates all possible combinations of the first to j-th features (c₁, c₂, . . . c_j) (step S204). Next, the classification-table generation means 105 verifies the entire combinations of the first to j-th features against the feature vectors already recorded in the classification table 106 in sequence (step S205) and deletes, from the entire combinations of the first to j-th features, any combination that matches a feature vector recorded in the classification table 106 (step S205-1). In this case, any feature for which the don't-care sign is recorded in the classification table 106 is handled as a feature that matches the recorded feature vector, whatever the feature value may be.

Then, the classification-table generation means 105 calculates expected values for the remaining feature patterns (c₁, c₂, . . . c_j) by using the features b_s calculated for the learning patterns at step S202, and makes a search for a feature pattern that satisfies the conditions for being recorded in the classification table 106 (step S206). More specifically, the classification-table generation means 105 selects feature patterns (c₁, c₂, . . . c_j) in sequence from among the remaining feature patterns and determines a given feature pattern to be a pattern representing the class q where it fulfills the following requirements (1) and (2). According to requirement (1), the expected value of the total number of learning patterns belonging to a predetermined class q that match the feature pattern is equal to or larger than a predetermined threshold value. According to requirement (2), every expected value of the total number of learning patterns belonging to the other classes that match the feature pattern is equal to or smaller than another predetermined threshold value. Then, the feature pattern is determined to be a pattern representing the class q and recorded in the classification table 106 with a mark q. At this time, the don't-care sign is recorded in each of the cells of the (j+1)-th to n-th features (step S207).

Where no feature pattern that fulfills the requirements is obtained after the above-described search is made, j is incremented by one (j=j+1) and the process returns to step S204. Then, the classification-table generation means 105 regenerates all possible combinations of the first to j-th features (c₁, c₂, . . . c_j). Where the search has run across the entire combinations up to the n-th feature and finished (j=n), the process is terminated.

Next, a procedure for another pattern judgment according to the embodiment will be described with reference to FIG. 5 once again.

First, the feature extraction means 202 calculates the feature vector (b₁, b₂, . . . b_n) according to Equations (6) by using the feature set stored in the feature storage means 104 for the input pattern input from the pattern input means 201. Then, the feature extraction means 202 transmits the calculated feature vector to the pattern judgment means 203. At this time, the probability that the s-th feature c_s (s=1 to n) becomes one is determined to be b_s, and the probability that the s-th feature c_s becomes zero is determined to be (1−b_s).

The pattern judgment means 203 calculates the probability that the input pattern belongs to each of the classes with reference to the classification table 106. More specifically, this process has the following steps.

For example, where the probability that the input pattern belongs to class q=0 is calculated, the pattern judgment means 203 first reads all feature patterns (c₁, c₂, . . . c_n) for which the mark q=0 is recorded from the classification table 106. Suppose that the first of the above-described feature patterns is (c₁, c₂, . . . c_n)=(1, 0, *, *, . . . , *) (where the symbol "*" indicates the don't-care sign). Since the probability that the features c₁ and c₂ of the input pattern satisfy c₁=1 and c₂=0 is b₁·(1−b₂), the probability that the feature vector of the input pattern matches this feature pattern is also b₁·(1−b₂). Since the third and later features are indicated by the don't-care sign, their values do not affect this probability.

As has been described, the pattern judgment means 203 calculates the probability that the feature vector of the input pattern agrees with each feature pattern corresponding to q=0 by using the probabilities b_s and (1−b_s) calculated from the input pattern, and obtains the total sum thereof. The total sum of the probabilities indicates the probability that the input pattern belongs to class q=0. The pattern judgment means 203 performs a comparison between the calculated probabilities that the input pattern belongs to each of the classes and outputs the class that provides the maximum probability as the judgment result.
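
A minimal sketch of this probability calculation follows (assuming the table rows of the earlier sketches, each marked with its class indicator, and `b` holding the feature probabilities of the input pattern):

    def class_probability(b, table, q_target):
        # Sum, over the rows marked q_target, of the probability that the
        # input's feature vector matches the row; don't-care positions '*'
        # contribute a factor of one.
        total = 0.0
        for row, q in table:
            if q != q_target:
                continue
            p = 1.0
            for r, bs in zip(row, b):
                if r != '*':
                    p *= bs if r == 1 else (1.0 - bs)
            total += p
        return total

    # The judgment result is the class maximizing this probability, e.g.
    # max((class_probability(b, table, q), q) for q in (0, 1))[1].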

Further, the calculated probability may be compared to a predetermined threshold value, and an instruction to reject (indicating that judgment is impossible) may be output according to the comparison result, as required.

Further, for reducing the processing time, only the classification table may be generated according to the method described in this embodiment, and the pattern judgment process may be performed according to the method described in the first embodiment. In this case, the feature vector corresponding to the input pattern is calculated according to Equations (2).

According to the above-described second embodiment of the present invention, effective noise is added to the learning patterns. Therefore, a feature set with higher margins can be selected. Consequently, the generalization ability (the ability to identify a pattern other than the learning patterns) of the second embodiment is higher than that of the first embodiment.

Next, a third embodiment of the present invention will be described with reference to the attached drawings. In this embodiment, the number of classes can easily be increased to three or more, as is the case with the above-described first and second embodiments. However, the number of classes is set to two (q=0 or 1) for the sake of description.

FIG. 7 is a block diagram illustrating a system configuration for the feature selection method according to the third embodiment of the present invention. Referring to this drawing, learning-pattern storage means 301, feature-candidate generation means 302, feature decision means 303, feature storage means 304, and a transition table 305 are shown. The description of the parts that are the same as those of the above-described embodiments will be omitted.

The feature decision means 303 is provided for deciding a feature set suitable for identifying a pattern from among the feature candidates generated by the feature-candidate generation means. The feature decision means 303 generates the transition table 305, which records parameters obtained during the feature-decision procedure of this embodiment.

The transition table 305 is a table recording parameters used for performing a pattern-judgment process that will be described later.

Next, a description of the parameters and the like will be presented before describing the procedures.

The number of provided learning patterns is determined to be M. Further, L sets D_(i) (i=1 to L) and L sets D′_(i) (i=1 to L) that pair off therewith are provided. Here, the sign L indicates a predetermined natural number, and the equation L=64 is adopted according to this embodiment.

Next, the procedures for the feature selection method of this embodiment will be described with reference to the attached drawings. The procedures for determining the first feature are entirely the same as those of the above-described second embodiment.

FIGS. 8, 9, and 10 show flowcharts illustrating the flow of processing for the feature selection method of the present invention. Referring to FIG. 8, where the first feature is determined (step S301), the feature decision means 303 initializes each of the sets D_(i) and D′_(i) (i=1 to L) (step S302). Since the equation L=64 holds, the sets D₁ to D₆₄ and the sets D′₁ to D′₆₄ are initialized (cleared, so as to be null sets).

A feature-order parameter m is set so that the equation m=2 holds. Subsequently, the procedures are started from the second feature.

Then, the feature decision means 303 calculates the feature b for each of the learning patterns according to Equations (6) by using the determined first-feature parameter (steps S303 to S305).

Next, the feature decision means 303 distributes the learning patterns into set E₁, where the weight value is determined to be b. Further, the feature decision means 303 distributes the learning patterns into set E₀, where the weight value is determined to be (1−b) (step S306).

Next, the feature decision means 303 calculates P(q=1|E₁) and P(q=1|E₀) according to the following equations.

$\begin{matrix}\left. \begin{matrix}{P\left( q = 1 \middle| E_{1} \right) = M\left( q = 1, E_{1} \right)/M\left( E_{1} \right)} \\ {P\left( q = 1 \middle| E_{0} \right) = M\left( q = 1, E_{0} \right)/M\left( E_{0} \right)}\end{matrix} \right\} & (7)\end{matrix}$

Here, M(E₀) indicates the total of the weights of the learning patterns belonging to set E₀, and M(E₁) indicates the total of the weights of the learning patterns belonging to set E₁. M(q=1, E₀) indicates the total of the weights of the learning patterns belonging to class q=1 and set E₀. Further, M(q=1, E₁) indicates the total of the weights of the learning patterns belonging to class q=1 and set E₁.
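
To make the weighted distribution concrete, the following Python sketch computes M(E₁), M(E₀), and the probabilities of Equations (7) under an assumed data layout (each learning pattern carried as a (w, q, b) tuple, with w its current weight, q its class, and b its feature value).

    def split_and_estimate(patterns):
        """patterns: list of (w, q, b). Returns (P(q=1|E1), P(q=1|E0))."""
        m_e1 = m_e0 = m_q1_e1 = m_q1_e0 = 0.0
        for w, q, b in patterns:
            m_e1 += w * b               # weight sent to E1
            m_e0 += w * (1.0 - b)       # weight sent to E0
            if q == 1:
                m_q1_e1 += w * b
                m_q1_e0 += w * (1.0 - b)
        p_e1 = m_q1_e1 / m_e1 if m_e1 > 0.0 else 0.0
        p_e0 = m_q1_e0 / m_e0 if m_e0 > 0.0 else 0.0
        return p_e1, p_e0

    # Example: one pattern of each class.
    print(split_and_estimate([(1.0, 1, 0.8), (1.0, 0, 0.3)]))  # (0.727..., 0.222...)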

Next, the feature decision means 303 copies the details of the distributed sets E₀ and E₁ to the sets D_(j) and D_(j′) determined according to the following equations from among the predetermined plurality of sets (D_(i)), respectively (step S307).

$\begin{matrix}\left. \begin{matrix}{d_{j - 1} < P\left( q = 1 \middle| E_{0} \right) \leq d_{j}} \\ {d_{j^{\prime} - 1} < P\left( q = 1 \middle| E_{1} \right) \leq d_{j^{\prime}}}\end{matrix} \right\} & (8)\end{matrix}$

Here, d_j is a predetermined constant given by the following equations (j=1 to L−1). According to these equations, the value of the constant d_j lies between zero and one inclusive and increases with j, so that the sequence resembles a letter S in shape:

$\begin{matrix}{0 = d_{0} < d_{1} < d_{2} < \ldots < d_{L} = 1,} & {d_{j} = 2^{j - 32}/\left( 1 + 2^{j - 32} \right)}\end{matrix}$

For example, where the equation P(q=1|E₀)=0.15 holds, the value j=30 satisfying the upper equation of Equations (8) is determined according to d₂₉=0.111 . . . and d₃₀=0.2, and E₀ is copied to D₃₀. Similarly, where the equation P(q=1|E₁)=0.7 holds, the value j′=34 satisfying the lower equation of Equations (8) is determined according to d₃₃=0.666 . . . and d₃₄=0.8, and E₁ is copied to D₃₄.
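
The constants d_j and the search for the bin satisfying Equations (8) can be sketched as follows (Python; L=64 as in this embodiment, with the convention, not stated in the text, that a proportion of exactly zero falls into the lowest bin).

    L = 64

    def d(j):
        """d_0 = 0, d_L = 1, and d_j = 2^(j-32) / (1 + 2^(j-32)) in between."""
        if j <= 0:
            return 0.0
        if j >= L:
            return 1.0
        t = 2.0 ** (j - 32)
        return t / (1.0 + t)

    def bin_index(p):
        """Return j with d_(j-1) < p <= d_j, as in Equations (8)."""
        if p <= 0.0:
            return 1                   # assumed convention for p = 0
        for j in range(1, L + 1):
            if p <= d(j):
                return j
        return L

    print(bin_index(0.15))  # 30, since d_29 = 0.111... < 0.15 <= d_30 = 0.2
    print(bin_index(0.7))   # 34, since d_33 = 0.666... < 0.7  <= d_34 = 0.8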

Of course, the above-described equations are determined so that they correspond to the number of the sets D_(i) and D′_(i) and are easy to handle in this embodiment. Therefore, the equations are not limited to those of the above-described embodiment.

Here, the feature decision means 303 records (1, j, j′) on the transition table 305 (step S307). For example, where the equations j=30 and j′=34 hold, (1, 30, 34) is recorded on the transition table (see FIG. 11), so as to be used for the pattern judgment that will be performed later. Where the first feature is represented by the equation c₁=1, the pattern is caused to transition to the set D_(j), and where the first feature is represented by the equation c₁=0, the pattern is caused to transition to the set D_(j′).

Next, referring to FIG. 9, first, the feature decision means 303 calculates the feature b for each learning pattern according to Equations (6) by using the s-th feature candidate (where s=1 to N, and s starts from one) (steps S309 to S312). At this time, as in the above-described second embodiment, the value of a feature c is determined to be one with probability b and determined to be zero with probability (1−b).

Next, the feature decision means 303 calculates the information amount MI′ obtained from the s-th feature candidate according to the following equations (step S313).

$\begin{matrix}{MI^{\prime} = H_{1} - \left\langle H_{2} \right\rangle_{c}} & (9) \\ {where} & \; \\ {H_{1} = - {\sum\limits_{q,i}{P\left( q \middle| D_{i} \right)\log\; P\left( q \middle| D_{i} \right)}},} & {P\left( q \middle| D_{i} \right) = M\left( q, D_{i} \right)/M\left( D_{i} \right)} \\ {H_{2} = - {\sum\limits_{q,i}{P\left( q \middle| c, D_{i} \right)\log\; P\left( q \middle| c, D_{i} \right)}},} & {P\left( q \middle| c, D_{i} \right) = M\left( q, c, D_{i} \right)/M\left( c, D_{i} \right)}\end{matrix}$

Here, M(D_(i)) indicates the sum of the weights of the learning patterns in the set D_(i), and M(q, D_(i)) is the sum of the weights of the learning patterns that are in the set D_(i) and that belong to class q. However, the sum relating to i is calculated only when the value of M(D_(i)) is other than zero. Further, M(c, D_(i)) is an expected value of the sum of the weights of the learning patterns that have features represented by c and that are in the set D_(i). M(q, c, D_(i)) is an expected value of the sum of the weights of the learning patterns that have features represented by c, that are in the set D_(i), and that belong to class q.
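
The following Python sketch spells out one plausible reading of Equations (9), under the same assumed (w, q, b) layout as above; the expectation ⟨H₂⟩_c is taken here by weighting each value of c with P(c|D_i)=M(c, D_i)/M(D_i), which the text does not state explicitly.

    import math

    def added_information(patterns_by_set):
        """patterns_by_set: list of sets D_i, each a list of (w, q, b) tuples,
        with b the probability that the candidate feature c equals one."""
        h1 = 0.0
        h2 = 0.0
        for patterns in patterns_by_set:
            m = sum(w for w, _, _ in patterns)                      # M(D_i)
            if m == 0.0:
                continue                                            # sum only over non-empty sets
            for q in (0, 1):
                mq = sum(w for w, qq, _ in patterns if qq == q)     # M(q, D_i)
                if mq > 0.0:
                    h1 -= (mq / m) * math.log(mq / m)               # H1 term
            for c in (1, 0):
                mc = sum(w * (b if c else 1.0 - b)
                         for w, _, b in patterns)                   # M(c, D_i)
                if mc == 0.0:
                    continue
                for q in (0, 1):
                    mqc = sum(w * (b if c else 1.0 - b)
                              for w, qq, b in patterns if qq == q)  # M(q, c, D_i)
                    if mqc > 0.0:
                        h2 -= (mc / m) * (mqc / mc) * math.log(mqc / mc)
        return h1 - h2                                              # MI' = H1 - <H2>_c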

In the above-described manner, the feature b is calculated for each learning pattern until the equation s=N holds. Where the information amount MI′ has been calculated for all of the feature candidates, the feature decision means 303 performs a comparison therebetween and determines the feature candidate having the largest information amount to be the m-th feature of the feature set to be determined (step S314).

After the m-th feature having the largest information amount is determined in the above-described manner, the feature decision means 303 performs the following operation for all of the sets D_(i) (D₁ to D₆₄).

Referring to FIG. 10, the feature decision means 303 calculates the feature b for each learning pattern according to Equations (6) by using the determined m-th feature parameter and distributes each of the learning patterns belonging to the set D_(i) to the sets E₁ and E₀ in a weight ratio of b:(1−b) (steps S315 to S318; see FIG. 13).

Next, the feature decision means 303 calculates P(q=1|E₁) and P(q=1|E₀) according to Equations (7).

Subsequently, the feature decision means 303 adds the contents of the sets E₀ and E₁ after the distribution to the sets D′_(j) and D′_(j′) that are determined according to Equations (8) from among the prepared plurality of sets (D′_(i)), respectively (step S319). For example, where the equation P(q=1|E₀)=0.05 holds, as shown in FIG. 13, the value j=28 satisfying the upper equation of Equations (8) is determined and E₀ is added to the set D′₂₈. Similarly, where the equation P(q=1|E₁)=0.3 holds, the value j′=31 satisfying the lower equation of Equations (8) is determined and E₁ is added to the set D′₃₁ (see FIG. 13).

Here, the feature decision means 303 records (m, i, j, j′) on the transition table 305 (step S320). For example, where the equations m=2, i=30, j=28, and j′=31 hold, (j=28 and j′=31) are recorded at the position corresponding to (m=2 and i=30). The recorded data indicates that a pattern is caused to transition to the set D_(j) where the value of the m-th feature is represented by the equation c_(m)=1, and that the pattern is caused to transition to the set D_(j′) where the value of the m-th feature is represented by the equation c_(m)=0. The recorded data is used for performing the pattern judgment that will be described later.

Where the above-described operations are finished for all of the sets D_(i) (i=1 to L), the feature decision means 303 copies the sets D′_(i) (i=1 to L) to the sets D_(i) (i=1 to L) and initializes (clears) the sets D′_(i) (i=1 to L) for determining a subsequent feature, that is, the (m+1)-st feature (step S321; see FIG. 13).
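
One round of steps S315 to S321 can be sketched as follows, reusing the illustrative helpers split_and_estimate and bin_index from the earlier sketches; the stored pair is read as (destination if c_m=1, destination if c_m=0), and the caller is assumed to refresh b for each pattern whenever a new feature is decided.

    def update_sets(d_sets, m, transition_table):
        """d_sets: {i: [(w, q, b), ...]}, with b the decided m-th feature value
        of each pattern. Returns the merged sets D'_i for the next round."""
        d_next = {}
        for i, patterns in d_sets.items():
            if not patterns:
                continue
            p_e1, p_e0 = split_and_estimate(patterns)    # P(q=1|E1), P(q=1|E0)
            j1, j0 = bin_index(p_e1), bin_index(p_e0)    # bins per Equations (8)
            transition_table[(m, i)] = (j1, j0)          # step S320
            for w, q, b in patterns:                     # steps S315 to S319
                if w * b > 0.0:
                    d_next.setdefault(j1, []).append((w * b, q, b))
                if w * (1.0 - b) > 0.0:
                    d_next.setdefault(j0, []).append((w * (1.0 - b), q, b))
        return d_next                                    # step S321 (D' -> D)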

The above-described procedures are repeated until the amount of information (the amount of added information) obtained at step S314 becomes smaller than a preset threshold value MI_th even when a new feature is selected. Therefore, unless the above-described termination condition is fulfilled, the feature decision means 303 sets the equation m=m+1 and returns to step S309 for determining the next, (m+1)-st feature (step S322).

On the other hand, where the termination condition is fulfilled, the parameters of the determined feature set are stored in the feature storage means 304 (step S323).

Here, the transition table according to the third embodiment of the present invention, which is generated for performing the judgment process, will be described. FIG. 11 shows an example transition table. Referring to this drawing, the transition table includes a first part illustrating the sets of the transition destinations for the first feature. The transition table further includes a second part that illustrates the transition destinations for the second feature and later. The second part is indexed by the feature-order parameters m, the set-number parameters i, and the feature values c_(m).

Further, referring to the first part of the transition table shown in this drawing, “30” is written in the cell corresponding to the equation c₁=1 and “34” is written in the cell corresponding to the equation c₁=0, which indicates that “where the first feature is represented by the equation c₁=1, the pattern is caused to transition to the set D₃₀, and where the first feature is represented by the equation c₁=0, the pattern is caused to transition to the set D₃₄”. Further, referring to the cells corresponding to the equations m=2 and i=30 of the second part of the transition table shown in this drawing, “28” is written in the cell corresponding to the equation c₂=1 and “31” is written in the cell corresponding to the equation c₂=0, which indicates that “where the second feature is represented by the equation c₂=1, a pattern belonging to the set D₃₀ is caused to transition to the set D₂₈, and where the second feature is represented by the equation c₂=0, the pattern is caused to transition to the set D₃₁”.

Each of the signs “−” shown in the cells of the second part of this drawing indicates that the cell is blank. For example, in the row (m=2) showing the transition destinations based on the value of the second feature c₂, each of the cells other than the cells on the columns corresponding to the sets D₃₀ and D₃₄ (i=30, 34) is the blank cell “−”. This configuration corresponds to the fact that the sets of the transition destinations corresponding to the first feature are represented by “30” or “34”. That is to say, according to the transition table shown in FIG. 11, an input pattern is caused to transition to the set D₃₀ or the set D₃₄ according to the first feature, which eliminates the need for referring to the other cells.

Further, according to the transition table shown in FIG. 11, the part illustrating the sets of the transition destinations for the first feature and the part illustrating the sets of the transition destinations for the second feature and later are separately shown. However, the configuration is not limited to the above-described embodiment, so long as the sets of the transition destinations are illustrated by the feature-order parameters m, the set-number parameters i, and the feature values c_(m). For example, the first and second parts shown in this drawing may be combined with each other, as shown in FIG. 12.
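
For illustration only, the combined table of FIG. 12 might be held in memory as a mapping from (m, i) to a destination pair, with the first feature filed under an assumed sentinel set number i=0; the blank cells of FIG. 11 are simply absent keys.

    transition_table = {
        (1, 0): (30, 34),   # first feature: c1 = 1 -> D30, c1 = 0 -> D34
        (2, 30): (28, 31),  # from D30: c2 = 1 -> D28, c2 = 0 -> D31
        # cells marked "-" in FIG. 11 have no entry
    }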

Next, the procedures for performing the pattern judgment in this embodiment will be described with reference to the attached drawing. FIG. 14 is a block diagram showing a pattern judgment method used for judging a pattern by using the above-described feature sets. Referring to this drawing, pattern input means 401, feature extraction means 402, and pattern judgment means 403 are shown. Further, the feature storage means 304 used by the feature extraction means 402 so as to extract features, and the transition table 305 used by the pattern judgment means 403 so as to judge a pattern, are shown.

The operation of the pattern judgment will now be described. First, the pattern input means 401 receives an input pattern from a predetermined medium and transmits the input pattern to the feature extraction means 402.

Next, the feature extraction means 402 calculates a feature vector (b₁, b₂, . . . , b_(n)) for the input pattern according to Equations (6) by using the feature set (determined according to the above-described feature decision method) stored in the feature storage means 304. Further, the feature extraction means 402 transmits the calculation result to the pattern judgment means 403.

At this time, the probability that the value of the feature c_(s) (s=1 to n) becomes one is determined to be the probability b_(s), and the probability that the value of the feature c_(s) becomes zero is determined to be the probability (1−b_(s)). The pattern judgment means 403 calculates the probability that the input pattern belongs to each of the classes by referring to the transition table 305 in sequence.

First, the pattern judgment means 403 reads the transition rule (1, j, j′) based on the first feature, and causes the state to transition to state j with probability b₁. Further, the pattern judgment means 403 causes the state to transition to state j′ with probability (1−b₁). Then, the probability that the value of the second feature c₂ becomes one is determined to be the probability b₂, and the probability that the value of the second feature c₂ becomes zero is determined to be the probability (1−b₂). Then, the pattern judgment means 403 reads the transition rules based on the second feature and further causes the state to transition. For example, where (2, j, k, k′) and (2, j′, k″, k′″) are written as the transition rules based on the second feature, state j is caused to transition to state k with probability b₂ and to state k′ with probability (1−b₂). Further, state j′ is caused to transition to state k″ with probability b₂ and to state k′″ with probability (1−b₂). In this case, therefore, the probability of remaining in state k is b₁·b₂, the probability of remaining in state k′ is b₁·(1−b₂), the probability of remaining in state k″ is (1−b₁)·b₂, and the probability of remaining in state k′″ is (1−b₁)·(1−b₂), respectively. In this manner, the pattern judgment means 403 causes the state to transition with reference to the transition table by using the features up to the n-th feature.

Where the state is caused to transition by using the features up to the n-th feature in the above-described manner, the probability of remaining in each state j is calculated as P(j) (j=1 to L). At this time, the probability P(q=1) that the input pattern belongs to class q=1 can be obtained according to the following equations.

$\begin{matrix}{P\left( q = 1 \right) = {\sum\limits_{j = 1}^{L}{P(j)\, P\left( q = 1 \middle| j \right)}}} & (10) \\ {where} & \; \\ {P\left( q = 1 \middle| j \right) = \left( d_{j} + d_{j - 1} \right)/2} & (11)\end{matrix}$
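
A minimal sketch of this probabilistic judgment, assuming the transition_table layout and the helper d() from the earlier sketches (and that an entry exists for every reachable (m, i) pair):

    def judge(b_vec, transition_table):
        """b_vec[s]: probability b_(s+1) that feature c_(s+1) equals one.
        Returns P(q=1) per Equations (10) and (11)."""
        state_prob = {0: 1.0}                 # sentinel start state
        for m, b in enumerate(b_vec, start=1):
            next_prob = {}
            for i, p in state_prob.items():
                j1, j0 = transition_table[(m, i)]
                next_prob[j1] = next_prob.get(j1, 0.0) + p * b          # c_m = 1
                next_prob[j0] = next_prob.get(j0, 0.0) + p * (1.0 - b)  # c_m = 0
            state_prob = next_prob            # P(j) after the m-th feature
        # Equation (10), with P(q=1|j) = (d_j + d_(j-1))/2 from Equation (11)
        return sum(p * (d(j) + d(j - 1)) / 2.0 for j, p in state_prob.items())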

Where the above-described probability P(q=1) is larger than a predetermined threshold value, the pattern judgment means 403 determines that “the input pattern belongs to class q=1” and outputs the determination result. Further, where the probability P(q=1) is smaller than the predetermined threshold value, the pattern judgment means 403 outputs a rejection, that is, determines that the judgment cannot be performed, and outputs the determination result. Of course, this configuration can be modified in various ways. For example, two values including an adoption critical value and a rejection critical value may be provided as the threshold values, and it may be determined that the pattern judgment cannot be performed where the probability value falls between the two values.

Further, according to this embodiment, the value of P(q=1|j) is determined according to Equation (11). However, the value can also be determined in the following manner by using the learning patterns. That is to say, the learning patterns f_(i)(r) (i=1 to M) are caused to transition according to the transition table, and the probability P(i, j) that each of the learning patterns stays in state j in the end is calculated based on the feature vectors (b₁, b₂, . . . , b_(n)) of the learning patterns. Next, the sum of the values of P(i, j) over only the learning patterns belonging to class q=1 is determined to be P1(j), and the sum of the values of P(i, j) over all of the learning patterns is determined to be Ptotal(j). Then, the value of P(q=1|j) is determined based on the equation P(q=1|j)=P1(j)/Ptotal(j).
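
This alternative estimate can be sketched by reusing the same propagation over the learning patterns (same assumptions as the judgment sketch above):

    def estimate_p_q1(learning_b_vecs, labels, transition_table):
        """Returns {j: P(q=1|j)} estimated as P1(j)/Ptotal(j)."""
        p1, ptotal = {}, {}
        for b_vec, q in zip(learning_b_vecs, labels):
            state_prob = {0: 1.0}
            for m, b in enumerate(b_vec, start=1):
                next_prob = {}
                for i, p in state_prob.items():
                    j1, j0 = transition_table[(m, i)]
                    next_prob[j1] = next_prob.get(j1, 0.0) + p * b
                    next_prob[j0] = next_prob.get(j0, 0.0) + p * (1.0 - b)
                state_prob = next_prob
            for j, p in state_prob.items():   # P(i, j) for this pattern
                ptotal[j] = ptotal.get(j, 0.0) + p
                if q == 1:
                    p1[j] = p1.get(j, 0.0) + p
        return {j: p1.get(j, 0.0) / t for j, t in ptotal.items() if t > 0.0}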

As a modification for reducing the processing time required for performing the pattern judgment of this embodiment, the following method can be adopted. That is to say, only the transition table 305 is generated according to the method of this embodiment, and the feature vectors for the input patterns are calculated according to Equations (2). In this case, the pattern judgment is performed in the following manner. That is to say, probabilistic operations are not performed during the judgment process, and the state is caused to transition deterministically according to the feature vectors obtained by Equations (2). In this case, where the state is caused to transition according to the transition table, the state transitions deterministically in each stage. Therefore, where the state is caused to transition by using the features up to the n-th feature, the state is finally fixed to a predetermined state from 1 to L, that is, state j. Then, the value of P(q=1|j) shown in Equation (11) corresponding to the above-described state j is examined. Where the value is larger than a predetermined threshold value, it is determined that “the input pattern belongs to class q=1” and the determination result is output.
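
The deterministic fast path reduces to a single chain of table lookups; a sketch under the same assumptions (binary features c from Equations (2) instead of probabilities):

    def judge_fast(c_vec, transition_table, threshold):
        """c_vec[s]: binary value of feature c_(s+1) from Equations (2)."""
        state = 0                                  # sentinel start state
        for m, c in enumerate(c_vec, start=1):
            j1, j0 = transition_table[(m, state)]
            state = j1 if c == 1 else j0           # deterministic transition
        p = (d(state) + d(state - 1)) / 2.0        # Equation (11)
        return 1 if p > threshold else None        # None: judgment rejected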

In the above-described third embodiment of the present invention, noises are effectively added to the learning patterns, as is the case with the second embodiment. Therefore, a feature set with a higher margin can be selected.

Further, according to the third embodiment of the present invention, the calculations performed for selecting the features can be significantly reduced compared to those of the second embodiment. This is because, even when the number n of selected features increases, the calculations for selecting the features can be achieved by calculating the amount of information obtained where a predetermined number of sets is divided according to the features thereof (Equations (9)), where the number of the sets is at most L.

Further, according to the third embodiment of the present invention, the information amount is calculated by merging once-separated learning patterns into the sets D_(i), as required. Therefore, the number of the learning patterns belonging to each set D_(i) is prevented from decreasing significantly. As a result, the occurrence of a phenomenon where the features are overly adapted to the learning patterns is reduced and the generalization ability further increases.

As has been described, according to the present invention, highly sophisticated pattern identification can be performed without requiring enormous learning. This is because the present invention provides a method for uniquely selecting the features later than a predetermined feature without depending on the value of the predetermined feature.

Further, according to the present invention, the identification ability (generalization ability) significantly increases. This is because the present invention provides a configuration for significantly simplifying the structure of a classifier.

INDUSTRIAL APPLICABILITY

The present invention is suitable for selecting, classifying, and judging the features of patterns used for image identification or the like.

CLAIMS

1. A feature selection method used for a system including learning-pattern storage for storing and maintaining a learning pattern including class information, feature-candidate generation for generating a plurality of feature candidates, and feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, wherein the feature decision, utilizing a computer-readable memory, prepares a predetermined number of sets for causing the learning patterns to transition according to feature values, decides a predetermined feature candidate having a largest amount of mutual information from the class information of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adds weights to each of the learning patterns according to the decided feature, distributes the learning patterns, and causes the learning patterns to transition to a set corresponding to the determined feature in sequence, and decides predetermined feature candidates having a next largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the decided feature are known.

2. The feature selection method according to claim 1, wherein the feature candidates generated by the feature-candidate generation include one of the feature candidates using a complex Gabor function, as a feature extraction function.
3. The feature selection method according to claim 1, wherein the feature candidates generated by the feature-candidate generation include a feature candidate obtained from a feature extraction function obtained by normalizing a complex Gabor function.

4. The feature selection method according to claim 1, wherein the feature decision calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates.
5. A method for classifying patterns used for a system including learning-pattern storage for storing and maintaining the learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, feature storage for storing and maintaining the feature set decided by the feature decision, and classification-table generation for generating a classification table, wherein the classification-table generation calculates the feature value of each of the learning patterns by using the feature set decided according to the feature selection method in claim 1, and classifies the learning patterns according to the classification table including the feature values of the learning patterns and class information.
6. The method for classifying learning patterns according to claim 5, wherein where the learning patterns can be classified irrespective of the feature values, the classification-table generation provides a redundant item in place of the feature value in the classification table at a position corresponding thereto.
7. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining the decided feature set, wherein the feature extraction calculates the feature value of each of the input patterns by using the feature set decided by the feature selection method according to claim 1, and performs the pattern judgment based on the calculated feature values.
8. The method for pattern judgment according to claim 7, wherein the pattern judgment is performed by using the classification table obtained by a method for classifying patterns including storing and maintaining the learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, and feature storage for storing and maintaining the feature set decided by the feature decision.

9. The method for pattern judgment according to claim 8, wherein each of the feature values of the input patterns indicates a value of a probability that a feature of a concerned rank becomes a predetermined value, wherein the pattern judgment performs the judgment by calculating a probability that each feature pattern included in the classification table becomes a value of predetermined class information by using the feature values.
10. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the feature extraction calculates a feature value of each of the input patterns by using the feature set decided according to the feature selection method according to claim 1, wherein the pattern judgment causes the input patterns to transition based on the feature values of the input patterns and a transition table that stores sets to which the learning patterns belong at the time where each feature of the feature set obtained by the feature selection method according to claim 1 is decided in sequence, and wherein the pattern judgment is performed based on the sets to which the input patterns belong, as a result of a transition.
11. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the feature extraction calculates a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set determined by the feature selection method according to claim 1, wherein the pattern judgment causes the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores sets to which learning patterns belong at the time where each feature is decided in sequence according to the feature selection method according to claim 1, and performs the pattern judgment by calculating a probability that each of the input patterns has predetermined class information based on a route of the transition.
12. A program on a computer-readable medium run by a computer forming a system including learning-pattern storage for storing on a computer-readable medium and maintaining a learning pattern including class information, feature-candidate generation for generating a plurality of feature candidates, and feature decision for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, wherein the program performs feature selection by making the computer perform the steps of: deciding a predetermined feature candidate having a largest amount of mutual information from the class information of the set of the learning patterns, as a first feature of the feature set, by calculating a feature value of each of the learning patterns corresponding to the feature candidates; and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that the decided features are known; and wherein the computer calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates.
13. A program on a computer-readable medium run by a computer forming a system including learning-pattern storage for storing and maintaining a learning pattern including class information, feature-candidate generation for generating a plurality of feature candidates, and feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, wherein the program performs feature selection by making the computer perform the steps of: preparing a predetermined number of sets for causing the learning patterns to transition according to a plurality of feature values; deciding a predetermined feature candidate having a largest amount of mutual information from class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adding a weight to each of the learning patterns according to the decided feature, distributing the learning patterns, and causing the learning patterns to transition to a set corresponding to the decided feature in sequence; and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the decided features are known.
14. The program according to claim 12, wherein the feature candidates include a feature candidate using a complex Gabor function, as a feature extraction function.

15. The program according to claim 12, wherein the feature candidates generated by the feature-candidate generation include a feature candidate obtained from a feature extraction function obtained by normalizing a complex Gabor function.
16. The program according to claim 12, wherein the program further makes the computer perform an operation for each of the learning patterns by using a noise parameter determined for each of the feature candidates.
17. A program run by a computer forming a system including learning-pattern storage for storing and maintaining learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, feature storage for storing and maintaining the feature set decided by the feature decision, and classification-table generation for generating a classification table, wherein the program performs learning-pattern classification by making the computer perform the steps of: calculating a feature of each of the learning patterns by using the feature set decided by running the program according to claim 12; and classifying the learning patterns according to the classification table including the feature values of the learning patterns and class information.
18. The program according to claim 17, wherein where the learning patterns can be classified irrespective of the feature values, a redundant item is provided in place of the feature value in the classification table at a position corresponding thereto.
19. A program run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program makes the computer perform a step of calculating the feature value of each of the input patterns by using the feature set decided by running the program according to claim 12, whereby pattern judgment is performed based on the performance result.
20. The program according to claim 19, wherein the computer performs the pattern judgment by using the classification table obtained by storing and maintaining learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, and feature storage for storing and maintaining the feature set decided by the feature decision.
21. The program according to claim 20, wherein each of the feature values of the input patterns calculated by the computer indicates a value of a probability that a feature of a concerned rank becomes a predetermined value, wherein the computer performs judgment by calculating a probability that each feature pattern included in the classification table becomes a value of predetermined class information by using the feature values.
22. A program on a computer-readable medium run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program judges the input patterns based on a set to which the input patterns belong by making the computer perform the steps of: calculating the feature value of each of the input patterns by using the feature set decided by running the program which performs feature selection by making the computer perform the steps of: deciding a predetermined feature candidate having a largest amount of mutual information from the class information of the set of the learning patterns, as a first feature of the feature set, by calculating a feature value of each of the learning patterns corresponding to the feature candidates; deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that the decided features are known; and wherein the computer calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates; and causing the input patterns to transition based on the feature values of the input patterns and a transition table that stores sets to which the learning patterns belong at the time where each feature is decided in sequence by running the program which performs feature selection by making the computer perform the steps of: preparing a predetermined number of sets for causing the learning patterns to transition according to a plurality of feature values; deciding a predetermined feature candidate having a largest amount of mutual information from class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adding a weight to each of the learning patterns according to the decided feature, distributing the learning patterns, and causing the learning patterns to transition to a set corresponding to the decided feature in sequence; and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the decided features are known.
23. A program on a computer-readable medium run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program judges the input patterns based on a set to which the input patterns belong by making the computer perform the steps of: calculating a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set decided by running the program which performs feature selection by making the computer perform the steps of: deciding a predetermined feature candidate having a largest amount of mutual information from the class information of the set of the learning patterns, as a first feature of the feature set, by calculating a feature value of each of the learning patterns corresponding to the feature candidates; deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that the decided features are known; and wherein the computer calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates; causing the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores sets to which learning patterns belong at the time where each feature is decided in sequence by running the program which performs feature selection by making the computer perform the steps of: preparing a predetermined number of sets for causing the learning patterns to transition according to a plurality of feature values; deciding a predetermined feature candidate having a largest amount of mutual information from class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adding a weight to each of the learning patterns according to the decided feature, distributing the learning patterns, and causing the learning patterns to transition to a set corresponding to the decided feature in sequence; and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the decided features are known; and calculating a probability that each of the input patterns has predetermined class information based on a route of the transition, whereby the input patterns are judged based on a result of the calculation.
24. A pattern learning system for maintaining the program according to claim 12 so that the program can be run, so as to perform learning-pattern-feature selection.
25. A pattern classification system for maintaining the program according to claim 18 so that the program can be run, so as to perform learning-pattern classification.
26. A pattern judgment system for maintaining the program according to claim 19 so that the program can be run, so as to perform input-pattern judgment.
27. The feature selection method according to claim 1, wherein the feature candidates generated by the feature-candidate generation include a feature candidate using a complex Gabor function, as a feature extraction function.
28. The feature selection method according to claim 1, wherein the feature candidates generated by the feature-candidate generation include a feature candidate obtained from a feature extraction function obtained by normalizing a complex Gabor function.
29. The feature selection method according to claim 1, wherein the feature decision performs an operation for each of the learning patterns by using a noise parameter determined for each of the feature candidates.
30. The feature selection method according to claim 1, wherein the feature decision calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates.
31. A method for classifying patterns used for a system including learning-pattern storage for storing and maintaining the learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, feature storage for storing and maintaining the feature set decided by the feature decision, and classification-table generation for generating a classification table, wherein the classification-table generation calculates the feature value of each of the learning patterns by using the feature set decided according to the feature selection method in claim 1, and classifies the learning patterns according to the classification table including the feature values of the learning patterns and class information.
32. The method for classifying learning patterns according to claim 31, wherein where the learning patterns can be classified irrespective of the feature values, the classification-table generation provides a redundant item in place of the feature value in the classification table at a position corresponding thereto.
33. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the feature extraction calculates the feature value of each of the input patterns by using the feature set decided by the feature selection method according to claim 1, and performs the pattern judgment based on a result of the calculation.
34. The method for pattern judgment according to claim 33, wherein the pattern judgment is performed by using the classification table obtained by storing and maintaining the learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a feature set suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, and feature storage for storing and maintaining the feature set decided by the feature decision, wherein where the learning patterns can be classified irrespective of the feature values, the classification-table generation provides a redundant item in place of the feature value in the classification table at a position corresponding thereto.
35. The method for pattern judgment according to claim 34, wherein each of the feature values of the input patterns indicates a value of a probability that a feature of a concerned rank becomes a predetermined value, wherein the pattern judgment performs the judgment by calculating a probability that each feature pattern included in the classification table becomes a value of predetermined class information by using the feature values.
36. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the feature extraction calculates the feature value of each of the input patterns by using the feature set decided according to the feature selection method according to claim 1, wherein the pattern judgment causes the input patterns to transition based on the feature values of the input patterns and a transition table that stores sets to which the learning patterns belong at the time where each feature of the feature set obtained by the feature selection method according to claim 1 is decided in sequence, and wherein the pattern judgment is performed based on the sets to which the input patterns belong, as a result of the transition.
37. A method for pattern judgment used for a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the feature extraction calculates a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set determined by the feature selection method according to claim 1, wherein the pattern judgment causes the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores sets to which learning patterns belong at the time where each feature is decided in sequence according to the feature selection method according to claim 1, and performs the pattern judgment by calculating a probability that each of the input patterns has predetermined class information based on a route of the transition.

38. The program according to claim 13, wherein the feature candidates include a feature candidate using a complex Gabor function, as a feature extraction function.
39. The program according to claim 13, wherein the feature candidates generated by the feature-candidate generation include a feature candidate obtained from a feature extraction function obtained by normalizing a complex Gabor function.
40. The program according to claim 13, wherein the program further makes the computer perform an operation for each of the learning patterns by using a noise parameter determined for each of the feature candidates.
41. The program according to claim 13, wherein the computer calculates a probability that a feature of each of the learning patterns has a predetermined value, as the feature value of each of the learning patterns corresponding to the feature candidates.
42. A program run by a computer forming a system including learning-pattern storage for storing and maintaining learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, feature storage for storing and maintaining the feature set decided by the feature decision, and classification-table generation for generating a classification table, wherein the program performs learning-pattern classification by making the computer perform the steps of: calculating a feature of each of the learning patterns by using the feature set decided by running the program according to claim 13; and classifying the learning patterns according to the classification table including the feature values of the learning patterns and class information.

43. The program according to claim 42, wherein where the learning patterns can be classified irrespective of the feature values, a redundant item is provided in place of the feature value in the classification table at a position corresponding thereto.
44. A program run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program makes the computer perform a step of calculating the feature value of each of the input patterns by using the feature set decided by running the program according to claim 13, whereby pattern judgment is performed based on the performance result.
45. The program according to claim 44, wherein the computer performs the pattern judgment by using a classification table obtained by running the program which performs learning-pattern classification by making the computer perform the steps of: calculating a feature of each of the learning patterns by using the feature set decided by running the program, which performs feature selection by making the computer perform the steps of: preparing a predetermined number of sets for causing the learning patterns to transition according to a plurality of feature values; deciding a predetermined feature candidate having a largest amount of mutual information from class information of the set of the learning patterns, as a first feature of the feature set, by calculating the feature value of each of the learning patterns corresponding to the feature candidates, adding a weight to each of the learning patterns according to the decided feature, distributing the learning patterns, and causing the learning patterns to transition to a set corresponding to the decided feature in sequence; and deciding predetermined feature candidates having a largest amount of mutual information between the feature value of the learning pattern corresponding to the feature candidate and the class information of the learning pattern in sequence, as next features of the feature set, under the condition that information about the set including the learning patterns and the decided features are known; and classifying the learning patterns according to the classification table including the feature values of the learning patterns and class information.
46. The program according to claim 45, wherein each of the feature values of the input patterns calculated by the computer indicates a value of a probability that a feature of a concerned rank becomes a predetermined value, wherein the computer performs judgment by calculating a probability that each feature pattern included in the classification table becomes a value of predetermined class information by using the feature values.
47. A program run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program judges the input patterns based on a set to which the input patterns belong by making the computer perform the steps of: calculating the feature value of each of the input patterns by using the feature set decided by running the program according to claim 13; and causing the input patterns to transition based on the feature values of the input patterns and a transition table that stores sets to which the learning patterns belong at the time where each feature is decided in sequence by running the program according to claim 13.

48. A program run by a computer forming a system including pattern input for inputting patterns, feature extraction for extracting features from the patterns, pattern judgment for judging the patterns based on the features, and feature storage for storing and maintaining a decided feature set, wherein the program makes the computer perform the steps of: calculating a feature probability for each of the input patterns, the feature probability indicating a probability that the feature of a concerned rank becomes a predetermined value, by using the feature set decided by running the program according to claim 13; causing the input patterns to transition based on the feature probabilities of the input patterns and a transition table that stores sets to which learning patterns belong at the time where each feature is decided in sequence by running the program according to claim 13; and calculating a probability that each of the input patterns has predetermined class information based on a route of the transition, whereby the input patterns are judged based on a result of the calculation.
49. A pattern learning system for maintaining the program according to claim 13 so that the program can be run, so as to perform learning-pattern-feature selection.
50. A pattern classification system for maintaining the program according to claim 42 so that the program can be run, so as to perform learning-pattern classification.
51. A pattern classification system for maintaining the program according to claim 43 so that the program can be run, so as to perform learning-pattern classification.
52. A pattern judgment system for maintaining the program according to claim 44 so that the program can be run, so as to perform input-pattern judgment.

53. A pattern judgment system for maintaining the program according to claim 45 so that the program can be run, so as to perform input-pattern judgment.
54. A pattern judgment system for maintaining the program according to claim 43 so that the program can be run, so as to perform input-pattern judgment.
55. A pattern judgment system for maintaining the program according to claim 46 so that the program can be run, so as to perform input-pattern judgment.
56. A pattern judgment system for maintaining the program according to claim 47 so that the program can be run, so as to perform input-pattern judgment.
57. A pattern judgment system for maintaining the program according to claim 48 so that the program can be run, so as to perform input-pattern judgment.
58. A program run by a computer forming a system including learning-pattern storage for storing and maintaining learning patterns used for learning, feature-candidate generation for generating a plurality of feature candidates, feature decision for deciding a set of features suitable for pattern identification from among the plurality of feature candidates generated by the feature-candidate generation, feature storage for storing and maintaining the feature set decided by the feature decision, and classification-table generation for generating a classification table, wherein the program performs learning-pattern classification by making the computer perform the steps of: calculating a feature of each of the learning patterns by using the feature set decided by running the program according to claim 13; and classifying the learning patterns according to the classification table including the feature values of the learning patterns and class information.
59. The feature selection method according to claim 1, wherein the feature decision method further comprises the steps of: preparing two working sets and the predetermined number of sets each labeled with a predetermined numeric value for causing the learning patterns to transition according to feature values; distributing each learning pattern to the two working sets with weights determined according to the determined feature, and computing the proportion of target-class learning patterns for each working set; causing the learning patterns in each working set to transition to a set labeled with a numeric value nearest to the proportion; deciding a predetermined feature candidate having a largest amount of information from the class information, under the condition that information about the distribution of the learning patterns over the sets and the feature already decided are known, as the second feature; for each of the sets, distributing each of the learning patterns within the set to the two working sets with weights determined according to the determined feature, thereby computing the proportion of target-class learning patterns for each working set; and deciding a predetermined feature candidate having a largest amount of mutual information from the class information, under the condition that information about the distribution of the learning patterns over the sets and the features already decided are known, as a next feature in the sequence.